Summary


The coronavirus nucleocapsid (N) protein packages vi- 
ral genomic RNA into a ribonucleoprotein complex. In- 
teractions between N proteins and RNA are thus crucial 
for the assembly of infectious virus particles. The 
45 kDa recombinant nucleocapsid N protein of corona- 
virus infectious bronchitis virus (IBV) is highly sensi- 
tive to proteolysis. We obtained a stable fragment of 
14.7 kDa spanning its N-terminal residues 29-160 
(IBV-N29-160). The crystal structure of the IBV-N29-160
fragment at 1.85 Å resolution reveals a protein core
composed of a five-stranded antiparallel β sheet with a
positively charged β hairpin extension and a hydrophobic
platform that are probably involved in RNA binding, as in
the N-terminal RNA binding domain (SARS-N45-181) of the
severe acute respiratory syndrome coronavirus (SARS-CoV)
N protein. Crosslinking studies
demonstrate the formation of dimers, tetramers, and 
higher multimers of IBV-N. A model for coronavirus 
shell formation is proposed in which dimerization of 
the C-terminal domain of IBV-N leads to oligomeriza- 
tion of the IBV-nucleocapsid protein and viral RNA con- 
densation. 


Introduction 


Coronaviruses are large enveloped single-stranded 
RNA viruses of positive polarity which cause a wide 
spectrum of diseases affecting humans and animals (re- 
viewed in Lai and Holmes, 2001). In 2003, the causative 
agent for the outbreak of atypical pneumonia with a high 
fatality rate was identified as the severe acute respira- 
tory syndrome (SARS) coronavirus (SARS-CoV) (Peiris 
et al., 2003), and its genome was rapidly sequenced 
and characterized (Marra et al., 2003; Rota et al., 2003). 
The potential risks for public health posed by SARS- 
CoV and the current lack of specific antiviral agents or 
vaccines against this emerging pathogen have triggered 
a global research effort to characterize this
family of viruses at the molecular level. Coronavirus
infectious bronchitis virus (IBV) causes an acute and
contagious disease in chickens, with a significant impact on
the poultry industry worldwide. 

In structural terms, coronavirus virions are roughly 
spherical, with an approximate diameter of 120 nm. 
Their detailed in vivo morphology is still a matter of de- 
bate but might be composed of three structural layers: 
a lipid envelope with three or four glycoproteins, a pro- 
tein core, and a tubular or helicoidal nucleocapsid, as 
shown for the porcine transmissible gastroenteritis virus 
(TGEV) (Escors et al., 2001). Low-resolution electron mi- 
crographs have highlighted the crown-like structure that 
surrounds the coronavirus envelope (Sturman et al., 
1980). These spikes contain the S protein, a class I
fusion glycoprotein (Bosch et al., 2004; Lescar et al.,
2001) which is also responsible for binding to the recep- 
tor (Lai and Holmes, 2001). Two integral membrane pro- 
teins, M (about 230 amino acids) and E (about 100 amino 
acids), are essential for the maturation of newly formed 
virions, and are sufficient for the formation of a closed 
viral particle (Vennema et al., 1996). The M protein is 
thought to possess three transmembrane segments 
and a large C-terminal endodomain that interacts with 
the nucleocapsid and possibly also with the RNA ge- 
nome (Sturman et al., 1980; Kou and Masters, 2002; Nar- 
ayanan et al., 2003). The nucleocapsid protein of IBV 
(IBV-N) is a phosphoprotein of 409 amino acids that is 
well-conserved across various IBV strains (Williams 
et al., 1992) and is also important for cell-mediated im- 
munity. It forms a protective shell that packages the viral 
genomic RNA of 27.6 kb and is also thought to partici- 
pate in viral RNA replication and transcription. Specific 
packaging of viral genetic material is usually performed 
via the recognition of a particular nucleotide sequence 
by a nucleocapsid protein. Such “packaging signals” 
have been identified at the 3’ end of the viral genomes 
of mouse hepatitis virus (MHV) (Fosmire et al., 1992) 
and bovine coronavirus (BCV) (Cologna and Hogue, 
2000) and at the 5’ end of the TGEV genome (Escors 
et al., 2003), but not unambiguously for the IBV genome. 
In elegant structural studies performed in other viral 
families with RNA genomes, such as HIV (De Guzman 
et al., 1998) and the MS2 bacteriophage (Valegard 
et al., 1997), the packaging signals were seen to form 
a stem-loop structure that is recognized by the nucle- 
ocapsid protein. In the case of the IBV genome, this 
special RNA structure has not been determined with 
certainty, although previous studies demonstrated that 
the IBV-N protein interacts specifically with RNA se- 
quences located at the 3’ noncoding region of the viral 
genome (Zhou et al., 1996). Both the N- and C-terminal 
domains of IBV-N, but not its middle region, bind to an 
oligoribonucleotide of 155 nucleotides, located at the 
3’ end of the viral genome nontranslated region, but little 
is known about the details of this interaction and how it 
relates to virus assembly (Zhou and Collisson, 2000). 

In an attempt to define how the viral genome is incor- 
porated into newly formed viral particles and how this 
process is coupled with nucleocapsid assembly, we 
have undertaken functional and structural studies using
the full-length N protein from IBV expressed in bacteria. 
The full-length recombinant IBV-N protein expressed in 
Escherichia coli is unstable. Through cleavage by E. coli 
proteases, a stable fragment of 14.7 kDa comprising its 
N-terminal residues 29-160 can be obtained. We report 
here the crystal structure of this N-terminal fragment
refined at 1.85 Å resolution and compare it to the N-terminal
RNA binding domain of the SARS-CoV N protein (SARS- 
N45-181), which was solved recently using NMR (Huang 
et al., 2004). We demonstrate the formation of multimers 
of the IBV-N protein in vitro and propose that dimers and 
possibly tetramers of IBV-N, which are stabilized pre- 
dominantly via dimerization of their C-terminal domains, 
act as elementary building blocks for RNA genome con- 
densation and nucleocapsid assembly. 


Results and Discussion 


Identification of a Stable Proteolytic Fragment of IBV-N

The full-length IBV-N protein comprising 409 residues 
was expressed in E. coli in a soluble form and purified 
as described in the Experimental Procedures. Crystalli- 
zation trials with the full-length protein produced crys- 
tals that grew from a precipitate after about 3 months. 
Analysis of dissolved crystals using SDS-PAGE reveals 
that they contain a fragment of the full-length protein 
of about 14.7 kDa (Figure 1). A domain of similar size 
could be obtained by incubating the IBV-N protein at 
room temperature for the same period (Figure 1). Thus, 
this polypeptide fragment presumably derives from 
slow proteolysis of IBV-N by traces of E. coli proteases 
present in the crystallization solution. In order to identify 
its nature, this proteolytically stable fragment was sub- 
jected to mass spectrometry, which revealed a mass 
of 14,692 Da. N-terminal amino acid sequencing identi- 
fied residues Ser-Ser-Gly-Asn-Ala-Ser-Trp, which are 
located at positions 29-35 of the IBV-N amino acid se- 
quence. Given that Ser-29 is the first amino acid of the 
fragment, the closest mapping onto the sequence gives 
Leu-160 as the C-terminal residue (calculated mass 
14,691 Da). The IBV-N29-160 protein shares 37% amino
acid sequence identity with the comparable N-terminal
RNA binding domain of the SARS-CoV N protein
(SARS-N45-181), whose structure was reported
recently (Figure 2).
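
The mapping of the C-terminal residue from the measured mass can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the residue masses are standard average values, the helper names are hypothetical, and the toy sequence is an assumption chosen only to make the example self-contained (it is not the IBV-N sequence).

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added for the free N and C termini


def peptide_mass(seq):
    """Average mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def closest_c_terminus(full_seq, start, observed_mass):
    """Scan C-terminal end points for a fragment beginning at `start`
    (1-based) and return (end, mass) closest to the observed mass."""
    best = None
    running = WATER
    for end in range(start, len(full_seq) + 1):
        running += RESIDUE_MASS[full_seq[end - 1]]
        if best is None or abs(running - observed_mass) < abs(best[1] - observed_mass):
            best = (end, running)
    return best


# Hypothetical toy sequence; the "observed" mass is that of residues 4-10.
toy = "MASSGNASWLLKR"
target = peptide_mass(toy[3:10])
end, mass = closest_c_terminus(toy, 4, target)
print(end, round(mass, 2))
```

With the real sequence, the same scan would start at Ser-29 and use the measured mass of 14,692 Da as the target.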


Structure Determination and Quality of the Model 

Overexpression of the recombinant N-terminal IBV-N29- 
160 fragment readily gave crystals diffracting beyond 
2.0 Å (Figure 1). Attempts to solve the structure by
molecular replacement using the averaged NMR structure
of a SARS-CoV nucleocapsid N-terminal domain deposited
in the Protein Data Bank (PDB) (Huang et al., 2004)
were unsuccessful, even though the two structures 
turned out to adopt a related fold (Figure 2). The IBV- 
N29-160 protein is devoid of methionine and cysteine res- 
idues. Thus, in order to assist structure determination 
using the multiwavelength anomalous dispersion (MAD) 
method, Ile-62, Leu-104, and Val-116 were mutated to
methionine. These hydrophobic amino acid residues 
have been shown to introduce little perturbation in the na- 
tive protein structure when substituted by methionine 

Figure 1. Structural Domains of the IBV-N Protein


(A) Schematic representation of the IBV-N protein depicting its var- 
ious domains and clustering of positive charges, as inferred from the 
present and other studies. 

(B) SDS-PAGE analysis of the full-length recombinant IBV-N protein 
of 44.9 kDa (lane 1, arrow) and the N-terminal proteolytically stable 
fragment of 14.7 kDa spanning residues 29-160 of the sequence 
which was crystallized (lane 2). The recombinant IBV-N29-160 is 
shown in lane 3. 

(C) Typical plate-shaped crystals of the recombinant IBV-N29-160 
protein. 


residues (Gassner and Matthews, 1999). In addition, the 
presumably exposed residue Lys-85 (as suggested by 
an amino acid sequence alignment with the SARS-CoV 
N protein) was mutated to Cys in order to introduce a po- 
tential binding site for mercury compounds. This mutated 
fragment of IBV-N29-160 was used for structure determi- 
nation using the MAD method with crystals containing the 
selenomethionyl protein. Data collection, phasing, and 
refinement statistics are summarized in Tables 1 and 2 
for the selenomethionine-derivatized crystal (SeMet) and 
for the native protein crystal. Overall, the path of the 
main chain is unambiguously defined in clear electron 
density for the two IBV-N29-160 molecules present in 
the asymmetric unit in each crystal form. A total of 134 
protein residues per molecule (two extra residues at 
the N terminus derive from the cloning procedure) 
were included in the final models, which have excellent 
stereochemical parameters as well as 182 and 188 well- 
defined water molecules, respectively (Table 2). Electron 
density is absent for the Lys-81 side chain which is ex- 
posed to the solvent. 


Overall Structure 

The two monomers present in the asymmetric unit can 
be superimposed with a root mean square (rms) deviation
of 0.5 Å for their main chain atoms. The IBV-N29-160
monomer has approximate overall dimensions of
35 Å × 35 Å × 30 Å and consists of a core formed by
a five-stranded antiparallel β sheet with the topology
β4-β2-β3-β1-β5, which faces a smaller antiparallel sheet
composed of only two strands, β1′-β4′, which are absent
in the SARS-N protein (Figure 2). A long flexible hairpin
loop β2′-β3′, which is inserted between the β2 and β3
strands, protrudes largely from the protein core. This

Figure 2. Overall Fold of the IBV-N Protein
(A and B) Comparison of the folds adopted by IBV-N29-160 ([A]; shown as a stereoview, top) and the N-terminal domain of the SARS-CoV nu- 
cleocapsid protein (B) (Huang et al., 2004). The two proteins are displayed in the same orientation. Secondary structure elements and some res- 
idue numbers are indicated. 
(C) Topology diagram of the IBV-N29-160 protein. Its N- and C-terminal ends are labeled. 


extension is mobile, as shown by higher than average 
temperature factors, and contains several basic resi- 
dues which are conserved across various coronavirus 
N protein sequences (Figure 3). Extended loops span- 
ning up to 30 residues connect the various secondary 
structure elements, presumably introducing flexibility 
to the overall architecture. This potential adaptability 
to various structural contexts might be important for as- 
sembly and disassembly of the nucleocapsid during the 
virus life cycle. The overall fold is similar to that of the
SARS-N45-181 protein (Figure 2) with a few structural
differences, such as the presence of a short 3₁₀ helix
connecting strands β1 and β2. Overall, a three-dimensional
structural alignment between the SARS-CoV and IBV
nucleocapsid N-terminal domains using the program DALI
(Holm and Sander, 1993) shows that a total of 124
equivalent Cα atoms can be superimposed, with an rms
deviation of 3.0 Å. The Z score is 10.4, confirming the global
similarity of the two folds. The rather large difference be- 
tween the SARS-CoV and IBV nucleocapsid N-terminal 
domain structures accounts for the failure of molecular 
replacement procedures to solve the latter structure us- 
ing the former as a model. The important structural dif- 
ferences we observe between the SARS-CoV and IBV 
nucleocapsid N-terminal domain structures may stem 
from an inherent mobility of the coronavirus nucleocap- 
sid structure or from a large uncertainty of the atomic 


positions determined by NMR, or both. A search through 
the PDB did not return any other protein with a statisti- 
cally significant Z score, emphasizing the uniqueness 
of this fold as noted by Huang et al. (2004). 


Dimer Formation 

In our crystal structure, the two monomers assemble 
into a butterfly-shaped dimer related by a 180° rotation, 
burying in this interaction an accessible surface area of 
560 Å². The transformation is not a pure rotation, as a
residual translation is needed to bring the two monomers
into coincidence. The relatively small surface area
suggests a rather weak binding affinity, an observation in
agreement with the fact that, using size exclusion
chromatography, the recombinant IBV-N29-160 protein
predominantly elutes as a monomer (see below). This is also
consistent with our findings of a different dimeric inter- 
face adopted by the same recombinant IBV-N29-160 
protein in a nonrelated crystal form (with space group 
C2) that diffracts only to medium resolution. 


Nucleic Acid Binding 

In order to package the viral genome of 27.6 kb, the 
IBV-N protein must provide extended surfaces to bind 
the viral RNA genome both specifically and nonspe- 
cifically (without a requirement for a special base se- 
quence). N- and C-terminal regions of IBV-N 

Table 1. Crystallographic Data Collection and Phasing Statistics

Data Set                      Native         SeMet (Three Residues Mutated to Met)
Protein                       IBV-N29-160    IBV-N29-160
                                             Peak           Inflection     Remote
Wavelength (Å)                1.5418         0.97943        0.97956        0.98729
Cell parameters (Å, °), P1    a = 35.48      a = 34.77
                              b = 35.72      b = 35.37
                              c = 56.11      c = 55.95
                              α = 99.05      α = 100.51
                              β = 93.93      β = 95.48
                              γ = 109.53     γ = 110.16
Resolution (Å)                20-1.85        20-1.95
Total number of reflections   75,798         76,265         64,999         72,832
No. of unique reflections     20,031         20,083         17,032         19,684
Completeness (%)ᵃ             92.4 (88.8)    96.6 (95.0)    96.5 (95.2)    95.6 (87.8)
Multiplicityᵇ                 3.8 (3.7)      3.8 (3.7)      3.8 (3.6)      3.7 (3.5)
Rmergeᶜ                       0.064 (0.625)  0.05 (0.118)   0.05 (0.131)   0.06 (0.177)
I/σ(I)                        7.4 (1.1)      8.6 (3.7)      8.3 (4.4)      9.1 (6.0)
Solvent content (%)           43.3           40.6
No. of Se sites               -              6
Phasing powerᵈ                -              0.7/0.6        0.6/0.4        0.2/1.1
f′/f″ᵉ                        -              -8.1/5.7       -10.5/3.3      -4.3/0.5
Figure of meritᶠ (20-2.5 Å)   -              0.61/0.793

ᵃ The numbers in parentheses refer to the last (highest) resolution shell.
ᵇ For the SeMet crystal, Friedel pairs are treated as different reflections.
ᶜ $R_{merge} = \sum_h \sum_i |I_{hi} - \langle I_h \rangle| / \sum_{h,i} I_{hi}$, where $I_{hi}$ is the ith observation of the reflection h, while $\langle I_h \rangle$ is its mean intensity.
ᵈ Anomalous phasing power/dispersive phasing power, where anomalous phasing power is $|F_{PH}^{+}| - |F_{PH}^{-}|$/anomalous lack of closure and dispersive phasing power is $|F_{P}| - |F_{PH}|$/dispersive lack of closure.


ᵉ Values of f′ and f″ were estimated from a scan of the absorption edge using the program CHOOCH (Evans and Pettifer, 2001).
ᶠ Figures of merit are given before and after real space density modification, respectively.
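
The solvent content listed in Table 1 can be approximated from the cell parameters with a Matthews-type estimate. The sketch below is illustrative only: it assumes two molecules per P1 cell (as described for the asymmetric unit), uses the measured fragment mass of 14,692 Da rather than the exact mass of the recombinant construct, and applies the conventional factor of 1.23 for the protein partial specific volume, so the result is expected to be close to, but not identical with, the tabulated value. The function names are hypothetical.

```python
import math


def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume (cubic angstroms) from lengths and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews(volume, mw_da, n_molecules):
    """Return (Vm in cubic angstroms per Da, estimated solvent fraction)."""
    vm = volume / (mw_da * n_molecules)
    return vm, 1.0 - 1.23 / vm


# Native crystal, space group P1 (cell parameters from Table 1).
volume = triclinic_volume(35.48, 35.72, 56.11, 99.05, 93.93, 109.53)
vm, solvent = matthews(volume, 14692.0, 2)  # assumed mass and copy number
print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.1f}%")
```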


encompassing residues 1-171 and 268-407, respec- 
tively, interact with noncoding regions of the viral geno- 
mic RNA located at its 3’ end (Zhou and Collisson, 2000). 


Table 2. Refinement Statistics

                                               Native        SeMet
Resolution range (Å)                           19.92-1.85    20.0-1.95
Intensity cutoff (F/σ(F))                      none          none
No. of reflections
  Completeness (%)                             100.0         [illegible]
  Used for refinement                          [illegible]   [illegible]
  Used for Rfree calculation                   1,026         881
No. of nonhydrogen atoms
  Protein                                      2130          2128
  Water molecules                              188           182
R factor (%)ᵃ                                  22.96         22.73
Rfree (%)ᵇ                                     27.08         [illegible]
Rms deviations from ideality
  Bond lengths (Å)                             0.007         0.008
  Bond angles (°)                              1.05          1.14
Ramachandran plot
  Residues in most favored regions (%)         88.8          90.3
  Residues in additional allowed regions (%)   [illegible]   [illegible]
  Residues in generously allowed regions (%)   1.0           0.5
Overall G factorᶜ                              0.10          0.04
PDB accession code                             2BXX          2BTL

ᵃ $R\ factor = \sum ||F_{obs}| - |F_{calc}|| / \sum |F_{obs}|$.
ᵇ Rfree was calculated with 5% of reflections excluded from the whole refinement procedure.
ᶜ G factor is the overall measure of structure quality from PROCHECK (Laskowski et al., 1993).

As the fragment 1-91 does not bind RNA, residues between
91 and 171 were proposed to either make direct contacts
with RNA or be necessary for the integrity of the protein
structure (Zhou and Collisson, 2000). Because the segment
92-95 includes strictly conserved hydrophobic residues
which are buried in the protein core in our structure, we
propose that the fragment 1-91 studied by Zhou and
Collisson (2000) was probably poorly folded and thus
nonactive. We tested nucleic acid binding by IBV-N29-160
and found that the recombinant fragment was able to bind
an oligoribonucleotide from the 3’ end of the viral genome
(Figure 4). This result is in agreement with studies by
Huang et al. (2004), who used NMR to demonstrate that
SARS-CoV N45-181 could bind a 32-mer oligoribonucleotide
located at the 3’ end of the SARS-CoV genome.
Interestingly, this oligoribonucleotide has a highly
conserved sequence across various coronaviruses including
IBV, and adopts a unique tertiary structure (Robertson
et al., 2005). A surface representation of electrostatic
charges of the IBV-N29-160 protein shown in Figure 5
reveals a striking segregation in the charge distribution on
the protein surface. The β2′-β3′ hairpin forms a basic
patch at the thumb, whereas the base is acidic (Figure 5).
These two charged patches are separated by a neutral
and rather hydrophobic platform contributed by residues
projecting from strands β4-β2-β3 that form a palm-like
structure. An alignment of nucleocapsid protein
amino acid sequences from various coronaviruses
highlights the conservation of several residues exposed at
the protein surface, suggesting that some might play
a role in nucleic acid recognition (Figures 3 and 5). The
topology of the protein and its charge distribution

Figure 3. Structure-Based Alignment of Coronavirus Nucleocapsid Amino Acid Sequences Corresponding to the Proteolytically Stable N-Terminal Fragment

Secondary structure elements are labeled above the sequence for IBV-N29-160 and below for the SARS-CoV N-terminal fragment (Huang et al., 2004). Sequences of IBV (infectious bronchitis virus, strain Beaudette, NP_040838); H-CoV (human coronavirus, strain HKU1, YP_173242); MHV (murine hepatitis virus, strain 1, AAA46439); TGEV (porcine transmissible gastroenteritis virus, strain RM4, AAG30228); and SARS (SARS-CoV, 1SSK_A) were obtained from GenBank. Conserved residues are shaded.

IBV     29: SSGNASWEQAIKAKKLNTPPPKFEGSGVBDNENIKPSQOHGYW : 71
H-CoV   58: TIPHYSWESGITOFOKGRDFKFSDGQOGVPIAFGVPPSEAKGYW : 100
MHV     62: VVPHYSWESGITOFOKGKEFOFAQGQOGVPIANGIPASEQKGYW : 104
TGEV    29: NNIPLSFENPITLQQOGSKFWNLCPRDFVPKGIG-NRDQOIGYW : 70
SARS    47: PNNTASWETALTQHGK-EELRFPRGQGVPINTNSGPDDOIGYY : 88

IBV     72: RR--QARFKPGKGGRKPVPDAWYFYYTGTGPAADLNWGDTQDG : 112
H-CoV  101: YRHSRRSFKTADGQOKOLLPRWYFYYLGTGPYANASYGESLEG : 143
MHV    105: YRHNRRSFKTPDGQOQKOLLPRWYFYYLGTGPHAGAEYGDDIDG : 147
TGEV    71: NR--QTRYRMVKGORKELPERWFEFYYLGTGPHADAKFKDKLDG : 111
SARS    89: RRATRR-VRGGDGKMKELSPRWYFYYLGTGPEASLPYGANKEG : 130

IBV    113: IVWVAAKGADTKSRSNOQGTRDPDKFDOYPLRESDG--GPDGNF : 153
H-CoV  144: VEWVANHQADTSTPSDVSSRDPTTOQEAIPTREPPGTILPQGYY : 186
MHV    148: VVWWVASQQADTKTTADIVERDPSSHEAIPTREAPGTVLPQGFY : 190
TGEV   112: VVWVAKDGAMNKP-TTLGSRGANN-ESKALKEDGKVPGEFQLE : 152
SARS   131: IVWVATEGALNTPKDHIGTRNPNNNAATVLOLPQGTTLPKGFY : 173

IBV    154: RWDFIPL : 160
H-CoV  187: VEGS-GR : 192
MHV    191: VEGS-GR : 196
TGEV   153: VNOQS--- : 156
SARS   174: AEGSRGG : 180

suggest a mode of RNA binding in which the RNA phosphate
groups would project toward the basic β2′-β3′ hairpin,
possibly making electrostatic interactions with the
conserved positively charged Arg-76 and Lys-78 residues,
while the sugar and base moieties would contact the
hydrophobic platform. In this model, the exposed
hydrophobic residues Tyr-92 and Tyr-94 (strand β3) could
form stacking interactions with the bases, as was
observed, for instance, in complexes between the vaccinia
virus protein VP39 and mRNA (Hu et al., 1999) or
between the matrix protein VP40 from Ebola virus and a
triribonucleotide (Gomis-Ruth et al., 2003). As suggested
by Huang et al. (2004), additional favorable interactions
might be formed upon closure of the flexible β2′-β3′
hairpin onto the incoming RNA ligand.


In Vitro Oligomerization of the IBV Nucleocapsid Protein

Oligomerization of N protein has been studied in MHV 
(Robbins et al., 1986) and SARS-CoV (He et al., 2004). 
Trimers of N subunits linked by intermolecular disulfide 
bonds were identified in MHV. Using mutational analy- 
sis, the Ser/Arg-rich motif spanning residues 184-196 
(immediately downstream from our crystallized frag- 
ment) was shown to be essential for the multimerization 
of the N protein from SARS-CoV (He et al., 2004). We an- 
alyzed the oligomerization states of the full-length IBV-N 
protein, IBV-N29-160, and IBV-N218-329 in solution. 
Crosslinking experiments were performed using glutar- 
aldehyde, a short self-polymerizing reagent mostly re- 
acting with the amino and amine groups of lysine and 
histidine, respectively (Buehler et al., 2005), and suberic
acid bis(N-hydroxysuccinimide ester) (SAB), a reagent
which only crosslinks lysine residues at larger distances. 
Concentrations of crosslinking agent higher than
0.1 mM led to the formation of dimers, tetramers (but 
not trimers), and larger oligomers of IBV-N, along with 
the disappearance of monomeric species (Figure 6). 
By contrast, an approximately 20-fold higher concentra- 
tion of crosslinking agent (2 mM glutaraldehyde or 1 mM 
SAB; see Figure 6) was required to obtain equal amounts 


Figure 4. Analysis of the RNA Binding Activity of the Full-Length 
IBV-N Protein and IBV-N29-160 and IBV-N218-329 Fragments 


The purified IBV-N (lanes 2 and 7), IBV-N29-160 (lanes 3 and 8), IBV- 
N218-329 (lanes 4 and 9), His-tagged IBV-N29-160 (lanes 5 and 10), 
and GST (negative control, lanes 6 and 11) were separated on a 15% 
SDS-PAGE gel. The proteins were either visualized by Coomassie 
brilliant blue staining (lanes 1-6) or transferred to Hybond C extra 
membrane (Amersham) and detected by Northwestern blot with a 
digoxin-labeled RNA probe corresponding to the IBV genome 
sequence from nucleotides 26,539-27,608 (lanes 7-11). Molecular 
masses of standard proteins are indicated. 

Figure 5. Proposed RNA Binding Site of IBV-N


(A) Surface representation of the IBV-N29-160 fragment with electro- 
static potentials colored in blue (positive) and red (negative). Resi- 
dues which are suggested to participate in RNA binding are labeled. 
The N- and C-terminal ends of the polypeptide chains are indicated. 
(B) Close-up view of the proposed RNA binding site of the IBV-N29-160
fragment. The Cα trace of IBV-N29-160 is displayed. Side chains
which are likely to participate in nucleic acid binding are shown as 
sticks. 


of monomers and dimers of IBV-N29-160. This suggests 
that regions within the C-terminal domain of IBV-N make 
a predominant contribution to the multimerization of 
IBV-N. Secondary structure predictions and limited
proteolysis studies of the IBV-N protein suggest the
presence of a structured (possibly α-helical) C-terminal
domain of about 12 kDa, which is connected to IBV- 
N29-160 by a Ser/Arg/Ala/Gly-rich loop of approxi- 
mately 50 amino acid residues (Figure 1). We expressed 
such a stable recombinant C-terminal domain encom- 
passing residues 218-329 of the N protein in a soluble 
form. This C-terminal domain can bind RNA (Figure 4), 
fold independently, and was recently crystallized (H.F., 
D.X.L., and J.L., unpublished data). Crosslinking experi- 
ments show that IBV-N218-329 forms dimers, trimers, 
tetramers, and higher oligomers for concentrations of 
crosslinking agent higher than 1 mM with a concomitant 
decrease in monomer species, thus confirming the
important contribution of the C-terminal domain of IBV-N to
the formation of IBV-N multimers (Figure 6). As an
independent confirmation, we subjected the IBV-N29-160
and IBV-N218-329 domains to size exclusion chro- 
matography (Figure 7). Under these conditions, the 
C-terminal domain IBV-N218-329 elutes faster than the 

Figure 6. Crosslinking Experiments


(A) Full-length IBV-N protein. 

(B) IBV-N29-160 protein, which was crystallized. 

(C) C-terminal fragment, IBV-N218-329. 

The nature and concentrations of crosslinking agent are shown. 
Monomer, dimer, trimer, and tetramer species of the recombinant 
proteins are indicated. 


N-terminal domain as a sharp symmetric peak corre- 
sponding to a dimer. The N-terminal domain elutes at 
a position intermediate between a monomer and a dimer 
(with an estimated size corresponding to a protein of mo- 
lecular weight 18.1 kDa). This pattern of migration could 
be due to the asymmetric shape of the IBV N-terminal 
domain or could be indicative of the presence of a mix- 
ture of monomer and dimer of the N-terminal domain in 
solution. 
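
The way an apparent molecular weight is read off a size exclusion column can be sketched with a log-linear calibration against the protein standards. This is only an illustration: the standards are those named in the Figure 7 legend, but the elution volumes below are hypothetical placeholders (they are not reported in the text), so the printed value is not expected to reproduce the 18.1 kDa estimate.

```python
import math

# (name, molecular weight in kDa, elution volume in ml).
# Elution volumes are hypothetical and serve only to make the sketch runnable.
STANDARDS = [
    ("ovalbumin", 43.0, 10.0),
    ("chymotrypsinogen A", 25.0, 11.5),
    ("ribonuclease A", 13.7, 13.0),
]


def fit_calibration(standards):
    """Least-squares fit of log10(MW) = slope * volume + intercept."""
    xs = [vol for _, _, vol in standards]
    ys = [math.log10(mw) for _, mw, _ in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mw(volume, slope, intercept):
    """Apparent molecular weight (kDa) at a given elution volume."""
    return 10 ** (slope * volume + intercept)


slope, intercept = fit_calibration(STANDARDS)
mw = apparent_mw(12.2, slope, intercept)  # hypothetical peak position
print(f"apparent MW ~ {mw:.1f} kDa ({mw / 14.7:.2f} x the 14.7 kDa monomer)")
```

A ratio between 1 and 2 relative to the monomer mass corresponds to the intermediate elution position discussed above; the calibration alone cannot distinguish an asymmetric monomer from a monomer-dimer mixture.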


Implications for Coronavirus Nucleocapsid Assembly

Our data suggest that residues 218-329 at the 
C-terminal end of the IBV-N protein play a major role 
for its multimerization. This is consistent with results 
reported by Surjit et al. (2004), who studied SARS-CoV 
nucleocapsid dimerization using the yeast two-hybrid 
system, and points to conserved assembly properties 
between the SARS-CoV and IBV in spite of significant 
amino acid differences between their two nucleocapsid 

Figure 7. Size Exclusion Chromatography Elution Profiles of IBV-N29-160 and IBV-N218-329

The vertical axis shows absorbance at 280 nm. The horizontal axis 
indicates the elution volume in milliliters. Three thin vertical lines in- 
dicate the positions of molecular weight of protein standards (from 
left to right: ovalbumin, 43 kDa; chymotrypsinogen A, 25.0 kDa; 
and ribonuclease A, 13.7 kDa). The large difference in absorbance 
stems from the different individual molar absorbance coefficients 
at 280 nm of IBV-N29-160 (40,540 M⁻¹ cm⁻¹) and IBV-N218-329
(4,080 M⁻¹ cm⁻¹).
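
The effect of the two molar absorbance coefficients on peak height follows from the Beer-Lambert law, A = ε c l. The sketch below uses the coefficients quoted in the legend; the protein concentration and the 1 cm path length are assumptions introduced only for illustration.

```python
# Molar absorbance coefficients at 280 nm (M^-1 cm^-1), from the legend above.
EPSILON_280 = {"IBV-N29-160": 40540.0, "IBV-N218-329": 4080.0}


def absorbance(epsilon, molar_conc, path_cm=1.0):
    """Beer-Lambert law: A = epsilon * c * l."""
    return epsilon * molar_conc * path_cm


conc = 20e-6  # hypothetical concentration (20 micromolar) for both domains
for name, eps in EPSILON_280.items():
    print(f"{name}: A280 = {absorbance(eps, conc):.3f}")

ratio = EPSILON_280["IBV-N29-160"] / EPSILON_280["IBV-N218-329"]
print(f"signal ratio at equal molar concentration: {ratio:.1f}")
```

At equal molar concentration the N-terminal domain therefore gives roughly a tenfold stronger signal, independent of the concentration chosen.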


proteins. Can we ascribe a function to multimer forma- 
tion by the N protein? One obvious explanation is that 
multimerization increases the protein surface area
accessible for binding the viral genomic RNA, thus
providing the elementary building block for nucleocapsid
assembly. Indeed, several crystal structures of capsid
proteins have revealed the presence of multimers that 
present continuous patches of basic residues at their 
surface: the capsid proteins of West Nile virus and Borna 
disease virus form tetrameric assemblies (Dokland et al., 
2004; Rudolph et al., 2003) and the nucleocapsid protein 
of porcine reproductive and respiratory syndrome virus, an arterivirus,
forms dimers (Doan and Dokland, 2003). Unfortunately, 
because these structures were determined in the ab- 
sence of an RNA ligand, it is difficult to evaluate to 
what extent multimer formation is coupled with nucleic 
acid recognition. In the Arteriviridae, a viral family
genomically related to the coronaviruses, the basic 
N-terminal half of the nucleocapsid protein is involved 
in RNA binding while its C-terminal domain forms a tight 
dimer (Doan and Dokland, 2003). 

Further complexity for the study of coronavirus nucle- 
ocapsid assembly stems from its interaction with the 
M protein endodomain (Kou and Masters, 2002; Nar- 
ayanan et al., 2000, 2003) and from the fact that several 
coronavirus proteins can interact with single-stranded 
RNA, including the nsp9 replicase protein from SARS- 
CoV (Egloff et al., 2004; Sutton et al., 2004). In the ab- 
sence of a nucleic acid ligand, the N protein appears 
to be composed of two main globular domains loosely 
connected by Arg/Ser/Ala/Gly-rich loops that are highly 
sensitive to proteolysis. These connecting regions may 
undergo modifications (e.g., phosphorylation) that could 
influence the multimerization state of the protein and 
control its interaction with RNA. In a recent report, su- 
moylation of Lys-62 of SARS-CoV N protein expressed 
in mammalian cells was proposed to promote dimeriza- 
tion of the protein (Li et al., 2005). It is not known whether 
similar modifications of the IBV-N occur in virus-infected 
cells. Nevertheless, a working hypothesis for coronavi- 
rus nucleocapsid formation can be proposed (Figure 8). 
In this model, viral genomic RNA binding by both the 
N- and C-terminal domains would lead to a clustering 


Figure 8. Hypothetical Model for the Assem- 
bly of the IBV Ribonucleoprotein Complex 


(A) Both the N- (cyan) and C-terminal (green) 
domains of the IBV-N protein can bind RNA 
(represented as a thin orange line). The basic 
patch in IBV-N29-160 is depicted by plus 
signs. Dimerization of the C-terminal 
domains (arrows) leads to a clustering of 
IBV-N proteins and to their oligomerization. 
(B) The endodomain of the integral membrane
protein M can provide further contacts
to the ribonucleocapsid (see text). However, 
the precise coupling between RNA recogni- 
tion and IBV-N multimerization remains un- 
certain. 

of N proteins. Dimerization of the C-terminal domains
would trigger oligomerization of the N-terminal domains 
by increasing their local concentration above a certain 
threshold. In turn, this would trigger condensation of 
viral RNA. Interdomain flexibility we have defined in the 
linker regions could facilitate the necessary conforma- 
tional changes during the transition to a more compact 
form of the ribonucleocapsid (Figure 8). 

Further studies are underway to elucidate the three- 
dimensional structure of the globular C-terminal domain 
of IBV-N, to define the interactions between the IBV-N 
protein and viral RNA, and to characterize the morphol- 
ogy of the ribonucleocapsid. 


Experimental Procedures 


Cloning and Expression 

The gene encoding the IBV-N protein was amplified by PCR using 
the Pfu polymerase (Stratagene, Singapore) with the forward (5/-AT 
TATT CAT ATG GCA AGC GGT AAA GCA GC-3’) and reverse primer 
(5'-ATTATT CTC GAG TCA AAG TTC ATT CTC TCC TA-3’) and 
cloned into the pET 29b vector using T4 ligase (Research Biolabs, 
Singapore). The underlined sequences correspond to Ndel and 
Xhol sites, respectively. Proteins (lacking the Hisg tag due to the in- 
sertion of a stop codon in the reverse primer) were expressed in 
E. coli BL21(DE3). The cells were grown at 37°C in Luria-Bertani me-
dium containing 100 µg/ml ampicillin until the culture reached an
OD600 of 0.7. Protein expression was induced by the addition of
1 mM isopropyl-β-D-thiogalactopyranoside for 3 hr at 30°C. Cells
harvested and resuspended at 4°C in a buffer containing 20 mM
Na3PO4 (pH 7.8) were lysed by sonication, and the remaining insolu-
ble material was removed by centrifugation at 20,000 × g for 20 min
at 4°C. N- and C-terminal fragments of the IBV-N gene coding for 
residues 29-160 and 218-329, respectively, were cloned into pET-
16b using the following primers: 5'-AATA CAT ATG TCT TCT GGA
AAT GCA TCT TG-3'; 5'-AATA CTC GAG TCA CAG GGG AAT GAA
GTC CC-3' and 5'-AAATA CAT ATG AAG GCA GAT GAA ATG
GC-3'; 5'-AAATA CTC GAG TCA CGT TCC TAC ACC ATC GAC-3'.
These two proteins (hereafter named IBV-N29-160 and IBV-N218- 
329, respectively) were expressed as described above for IBV-N, 
yielding truncated fragments having a His tag at their N terminus
followed by a Factor Xa cleavage site. The His tags were cleaved
during purification. Expression of the selenomethionylated protein 
IBV-N29-160 was carried out as described in Doublié (1997). 
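
The underlining that marked the restriction sites in the primers is lost in plain text, so the sites can be located programmatically instead. The following is a minimal sketch in Python, assuming the primer sequences exactly as printed above with the spaces removed. The helper names (`revcomp`, `find_site`, `stop_after_xhoi`) and the dictionary labels are hypothetical and are not part of the original protocol. It finds the NdeI (CATATG) and XhoI (CTCGAG) recognition sequences and reads the codon that follows each XhoI site on the coding strand.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

# Primers as printed in the text (5'->3'), spaces removed.
# The dictionary keys are illustrative labels, not names used by the authors.
PRIMERS = {
    "IBV-N forward": "ATTATTCATATGGCAAGCGGTAAAGCAGC",
    "IBV-N reverse": "ATTATTCTCGAGTCAAAGTTCATTCTCTCCTA",
    "N29-160 forward": "AATACATATGTCTTCTGGAAATGCATCTTG",
    "N29-160 reverse": "AATACTCGAGTCACAGGGGAATGAAGTCCC",
    "N218-329 forward": "AAATACATATGAAGGCAGATGAAATGGC",
    "N218-329 reverse": "AAATACTCGAGTCACGTTCCTACACCATCGAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_site(primer, enzyme):
    """Return the 0-based start of the enzyme's site in the primer, or -1."""
    return primer.find(SITES[enzyme])


def stop_after_xhoi(primer):
    """Return the codon that follows the XhoI site, read on the coding strand.

    A reverse primer is the reverse complement of the coding strand, so the
    three bases after CTCGAG are reverse-complemented before being reported.
    """
    start = find_site(primer, "XhoI")
    if start < 0:
        return None
    return revcomp(primer[start + 6:start + 9])


for name, seq in PRIMERS.items():
    enzyme = "NdeI" if "forward" in name else "XhoI"
    print(f"{name}: {enzyme} site at index {find_site(seq, enzyme)}")
    if enzyme == "XhoI":
        print(f"  codon after site (coding strand): {stop_after_xhoi(seq)}")
```

In each reverse primer the XhoI site is followed by TCA, which reads as a TGA stop codon on the coding strand. This matches the statement that a stop codon was introduced through the reverse primer.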


Analysis of the Proteolytically Stable Fragment Derived from IBV-N

Automated N-terminal amino acid sequence determination of the 
proteolytic fragment obtained by degradation of IBV-N was per- 
formed using an Applied Biosystems (Singapore) Procise se- 
quencer. The molecular mass of purified proteins was analyzed us- 
ing a MALDI-TOF mass spectrometer (API 300 MS/MS; Applied 
Biosystems). 


Protein Purification 

The IBV-N protein was precipitated with ammonium sulfate at 30% satu-
ration, centrifuged, resuspended in PBS, dialyzed against
buffer A (20 mM HEPES, 1 mM EDTA, 1 mM DTT [pH 6.8]), and loaded
onto a cation exchange chromatography column (Mono S HR 5/5; 
GE Biosciences, Singapore) preequilibrated with buffer A. Elution 
was carried out using an NaCl gradient of buffer B (20 mM HEPES, 
1 mM EDTA, 1 mM DTT, 1 M NaCl [pH 6.8]). Fractions containing 
the protein—as shown by SDS-PAGE—were pooled and concen- 
trated to 10-15 mg/ml by ultrafiltration using a Centriprep device 
(Millipore, Singapore) with a molecular weight cutoff of 10 kDa. 
Size exclusion chromatography (Superdex 75; Amersham) was car- 
ried out in a buffer containing 20 mM Tris-HCl (pH 8.0), 150 mM NaCl, 
1 mM DTT, 0.1% NaN3. The protein was concentrated to 10 mg/ml as
determined by the Bradford assay (Bio-Rad, Singapore), using BSA
as a standard. The truncated recombinant IBV-N29-160 was resus-
pended in PBS and loaded onto an Ni-NTA column (Qiagen, Singa-
pore) preequilibrated with 20 mM KH2PO4, 50 mM NaCl (pH 7.8). Af-
ter washing with 20 mM KH2PO4, 1 M NaCl, 10 mM imidazole (pH
7.2), IBV-N29-160 was eluted using a buffer containing 20 mM
KH2PO4, 0.5 M NaCl, 0.5 M imidazole (pH 6.0). The His tag was re-
moved by proteolysis with Factor Xa in a buffer containing 100 mM
NaCl, 2 mM CaCl2, 10 mM Tris (pH 8.0) using a substrate:enzyme mo-
lar ratio of 50:1 for 4 hr at room temperature. The cleavage mixture
was loaded onto a benzamidine column to eliminate Factor Xa, 
and the IBV-N29-160 protein recovered in the flow through was sub- 
jected to two final steps of purification as described above for the 
full-length IBV-N protein. Purification of the recombinant IBV- 
N218-329 was carried out using a similar protocol. Purification of 
the selenomethionine-substituted protein was performed using the 
same protocol as the native protein. 


Crystallization of IBV-N29-160 

Crystals of the recombinant IBV-N29-160 were grown at 18°C by va- 
por diffusion using the hanging drop method. Two microliters of the 
protein at a concentration of 10 mg/ml was mixed with an equal vol- 
ume of the precipitating solution from the well (0.1 M sodium sulfate, 
20% PEG 3350), yielding plate-shaped crystals growing to maxi-
mum dimensions of about 0.3 × 0.3 × 0.05 mm³ in about 2 weeks
(Figure 1). Crystals of the selenomethionine protein were obtained 
under the same conditions. 


Data Collection, Structure Determination, and Refinement 

For data collection, crystals were soaked in a cryoprotecting solu-
tion (25% glycerol, 0.1 M sodium sulfate, 20% PEG 3350 [pH 6.5]) be-
fore being mounted and cooled to 100 K in a nitrogen gas stream
(Oxford Cryosystems, Oxford, UK). Diffraction intensities at three 
wavelengths (Table 1) were recorded from a selenomethionine 
(SeMet)-derivatized IBV-N29-160 crystal on beamline NW12 at the 
Photon Factory (Tsukuba, Japan) on an ADSC charge-coupled de-
vice (CCD) detector (ADSC Corporation, Poway, CA) using an atten-
uated beam of dimensions 0.1 × 0.1 mm² (Table 1). Integration, scal-
ing, and merging of the intensities were carried out using programs
from the CCP4 suite (CCP4, 1994). The six selenium atoms present within the two
molecules of the asymmetric unit were located using the program 
SOLVE (Terwilliger, 2003). An initial electron density map was calcu- 
lated and modified using the program RESOLVE (Terwilliger, 2003), 
using these selenium atom positions to locate the noncrystallo- 
graphic symmetry (ncs) axis relating the two molecules in the asym- 
metric unit, and model building was first carried out in this map using 
the program O (Jones et al., 1991). For subsequent cycles, electron 
density maps were calculated using partial model phases combined 
with experimental MAD phases with the program REFMAC5 from the
CCP4 suite (CCP4, 1994), which was also used for the initial refinement of the struc-
ture with ncs restraints. A few cycles of refinement using
molecular dynamics with a slow cooling protocol using a maximum 
likelihood target incorporating phase probability distribution en-
coded in the form of Hendrickson-Lattman coefficients were subse-
quently carried out using the program CNS (Brunger et al., 1998),
with ncs restraints. A data set for the native protein was collected 
on an R-AXIS IV++ image plate detector using CuKα radiation from
a Micromax-007 rotating anode (Rigaku/MSC, The Woodlands, TX) 
operating at 20 mA and 40 kV (Table 1). The SeMet model was placed 
in the native crystal form and adjustments to the model were carried 
out using difference Fourier maps calculated with REFMAC5, which
was used for refinement. Superposition of structures and rms devi-
ation calculations were carried out using the program LSQKAB from
the CCP4 suite (CCP4, 1994). Figures 2 and 5 were produced with the program
PyMOL (DeLano, 2002). 


Crosslinking Experiments 

The purified recombinant proteins IBV-N, IBV-N29-160, and IBV- 
N218-329 were incubated with either glutaraldehyde or SAB 
(Sigma-Aldrich, St. Louis, MO) for 2 hr at 20°C using a constant 
amount of protein (5 µg) with increasing amounts of the crosslinking
agent. The samples were submitted to electrophoresis on an 8%- 
15% SDS-PAGE gel and stained with Coomassie blue. 


Size Exclusion Chromatography 

A Superdex 75 10/300 GL size exclusion chromatographic column 
(Amersham) mounted on an AKTA FPLC (GE Biosciences, Singa- 
pore) was used to analyze the homogeneity and apparent 
multimerization states of IBV-N29-160 and IBV-N218-329, respec-
tively. Protein concentrations used were 10 mg/ml and the loaded
sample volume was 0.1 ml. The buffer was 10 mM Tris-HCl, 0.2 M 
NaCl, 3 mM β-mercaptoethanol (pH 7.5) and the flow rate was 0.5
ml/min. Standard protein markers (Amersham) used for calibration 
were ribonuclease A, 13.7 kDa, elution 14.88 ml; chymotrypsinogen 
A, 25.0 kDa, 13.81 ml; ovalbumin, 43.0 kDa, 11.81 ml; BSA, 67.0 kDa, 
10.89 ml. Apparent size/molecular weights were deduced by plot-
ting $K_{av}$ versus $\log(\mathrm{MW})$, with $K_{av} = (V_e - V_0)/(V_t - V_0)$, where $V_e$ is
the elution volume of the protein, $V_t$ is the total column bed volume,
and $V_0$ is the void volume.
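
The calibration described above can be illustrated with a short calculation. The following is a minimal sketch in Python, not the authors' actual analysis. It uses the four standards and elution volumes listed in the text. The void volume and total bed volume are not reported here, so the values of `V0_ML` and `VT_ML` below are placeholder assumptions, and the function names are hypothetical. The sketch fits a straight line to Kav versus log10(MW) by ordinary least squares and inverts it to obtain an apparent molecular weight from an elution volume.

```python
import math

# Calibration standards from the text: (name, molecular mass in kDa, elution volume in ml).
STANDARDS = [
    ("ribonuclease A", 13.7, 14.88),
    ("chymotrypsinogen A", 25.0, 13.81),
    ("ovalbumin", 43.0, 11.81),
    ("BSA", 67.0, 10.89),
]

# ASSUMPTION: the void volume and total bed volume are not given in the text.
# These are placeholder values chosen only to make the sketch runnable.
V0_ML = 8.0
VT_ML = 24.0


def kav(ve, v0=V0_ML, vt=VT_ML):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards, v0=V0_ML, vt=VT_ML):
    """Least-squares fit of Kav = slope * log10(MW) + intercept."""
    xs = [math.log10(mw) for _, mw, _ in standards]
    ys = [kav(ve, v0, vt) for _, _, ve in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apparent_mw(ve, slope, intercept, v0=V0_ML, vt=VT_ML):
    """Invert the calibration line to get an apparent MW (kDa) from Ve (ml)."""
    return 10 ** ((kav(ve, v0, vt) - intercept) / slope)


slope, intercept = fit_calibration(STANDARDS)
for name, mw, ve in STANDARDS:
    estimate = apparent_mw(ve, slope, intercept)
    print(f"{name}: nominal {mw} kDa, back-calculated {estimate:.1f} kDa")
```

Because Kav is a linear function of the elution volume, the apparent molecular weight obtained from the fitted line does not depend on the particular void and bed volumes assumed, provided the same values are used for fitting and for inversion. The placeholder values therefore affect only the reported Kav numbers, not the mass estimates.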


RNA Binding Assay 

The full-length IBV-N protein and the IBV-N29-160 and IBV-N218- 
329 fragments were expressed in E. coli BL21 cells and purified as 
described above. The polyhistidine tags of the truncated proteins 
were removed by digestion with Factor Xa. The purified proteins 
were separated on 15% SDS-PAGE, transferred to Hybond C extra 
membrane (Amersham), and probed with digoxin-labeled RNA rep- 
resenting the negative sense of the IBV genome from nucleotides 
25,873-27,608. The probe was made by in vitro transcription using 
SP6 polymerase in the presence of digoxin according to the manu- 
facturer’s instructions (Roche, Singapore). 